Deep learning based approach to unstructured record linkage
Abstract
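Purpose: In the world of big data, data integration technology is crucial for maximising the capability of data-driven decision-making. Integrating data from multiple sources drastically expands the power of information and allows us to address questions that are impossible to answer using a single data source. Record Linkage (RL) is the task of identifying and linking records that describe the same real-world object (e.g. a person), and it plays a crucial role in the data integration process. RL is challenging, as it is uncommon for different data sources to share a unique identifier. Hence, records must be matched based on a comparison of their corresponding values. Most existing RL techniques assume that records across different data sources are structured and represented by the same scheme (i.e. set of attributes). Given the increasing amount of heterogeneous data sources, those assumptions are rather unrealistic. The purpose of this paper is to propose a novel RL model for unstructured data.
Design/methodology/approach: In previous work (Jurek-Loughrey, 2020), the authors proposed a novel approach to linking unstructured data based on the application of the Siamese Multilayer Perceptron model. It was demonstrated that the method performed on par with other approaches that make constraining assumptions regarding the data. This paper expands that work, originally presented at iiWAS2020 [16], by exploring new architectures of the Siamese Neural Network, which improve the generalisation of the RL model and make it less sensitive to parameter selection.
Findings: The experimental results confirm that the new Autoencoder-based architecture of the Siamese Neural Network obtains better results than the Siamese Multilayer Perceptron model proposed in (Jurek et al., 2020). Better results were achieved on three out of four data sets. Furthermore, the second proposed (hybrid) architecture was shown to make the model more stable in terms of parameter selection. This architecture integrates the Siamese Autoencoder with a Multilayer Perceptron model.
Originality/value: To address the problem of unstructured RL, this paper presents a new deep learning based approach. The approach improves the generalisation of the Siamese Multilayer Perceptron model and makes it less sensitive to parameter selection.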
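The following is a minimal, untrained Python sketch (standard library only) that makes the Siamese idea concrete. It is not the authors' model. The following choices are all assumptions made for illustration:

- the hashed character-trigram featurisation (`hashed_trigrams`);
- the single tanh layer (`MLPBranch`);
- the Euclidean distance;
- the contrastive loss.

The sketch shows only the defining property of a Siamese network. One set of weights embeds both records, so two free-text records can be compared without first mapping them onto a shared attribute scheme.

```python
import math
import random
import zlib


def hashed_trigrams(text, dim=64):
    """Featurise an unstructured record by hashing character trigrams into a
    fixed-length count vector (hypothetical featurisation, for illustration)."""
    s = f"  {text.lower()} "
    vec = [0.0] * dim
    for i in range(len(s) - 2):
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    return vec


class MLPBranch:
    """One hidden tanh layer. A single instance embeds BOTH records, so the two
    branches of the Siamese network share their weights by construction."""

    def __init__(self, in_dim=64, out_dim=16, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # Untrained random weights; in practice they would be learned from labelled pairs
        self.w = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(self.w, self.b)]


def siamese_distance(branch, rec_a, rec_b, dim=64):
    """Embed both records with the shared branch and return their Euclidean distance."""
    u = branch(hashed_trigrams(rec_a, dim))
    v = branch(hashed_trigrams(rec_b, dim))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def contrastive_loss(d, is_match, margin=1.0):
    """Pull matching pairs together (d -> 0); push non-matches at least `margin` apart."""
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


if __name__ == "__main__":
    branch = MLPBranch()
    a = "John Smith, 12 High Street, Belfast"
    b = "J. Smith 12 High St Belfast"
    d = siamese_distance(branch, a, b)
    print(f"distance={d:.3f}, loss if labelled a match={contrastive_loss(d, True):.3f}")
```

Training would minimise `contrastive_loss` over labelled record pairs. Under the same framing, an Autoencoder-based variant such as the one described above would replace `MLPBranch` with the encoder half of an autoencoder. That substitution is an assumption about the general technique, not a description of the paper's architecture.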
Similar resources
Electre Tri-Machine Learning Approach to the Record Linkage Problem
In this short paper, the Electre Tri-Machine Learning Method, generally used to solve ordinal classification problems, is proposed for solving the Record Linkage problem. Preliminary experimental results show that, using the Electre Tri method, high accuracy can be achieved and more than 99% of the matches and nonmatches were correctly identified by the procedure.
Implementing a Bayesian Approach to Record Linkage
The Census Coverage Measurement survey-based program estimated household population coverage of the 2010 Decennial Census. Calculating coverage estimates required linking survey person data to census enumerations. For record linkage research, we applied a Bayesian Latent Class Models approach to both 2010 coverage survey data and simulated household data. This paper presents our use of Base SAS...
Validating Distance-Based Record Linkage with Probabilistic Record Linkage
This work compares two alternative methods for record linkage: distance based and probabilistic record linkage. It compares the performance of both approaches when data is categorical. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic-based record linkage lead to similar results in relation to the num...
Behavior Based Record Linkage
In this paper, we present a new record linkage approach that uses entity behavior to decide if potentially different entities are in fact the same. An entity’s behavior is extracted from a transaction log that records the actions of this entity with respect to a given data source. The core of our approach is a technique that merges the behavior of two possible matched entities and computes the ...
Supervised learning approach for distance based record linkage as disclosure risk evaluation
In data privacy, record linkage is a well known technique to evaluate the disclosure risk of protected data. It is used to evaluate the number of linked records between a data set and its protected version. In this paper we give an overview of the work that we have been doing during the last months. We describe the development of a supervised learning method for distance-based record linkage, w...
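The distance-based record linkage in the entries above can be illustrated with a short sketch. Each record in one file is linked to its nearest record in the other, under a distance that mixes nominal and ordinal attributes. The following are assumptions for illustration, not the distance defined in those papers:

- the per-attribute distances: 0/1 mismatch for nominal values, and for ordinal values the rank difference divided by the number of levels minus one;
- the unweighted mean over attributes;
- the hypothetical `schema` format.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical schema format: attribute name -> ordered levels (ordinal) or None (nominal).
Schema = Dict[str, Optional[List[str]]]
Record = Dict[str, str]


def attr_distance(a: str, b: str, levels: Optional[List[str]]) -> float:
    """Nominal: 0 if equal, else 1. Ordinal: |rank(a) - rank(b)| / (k - 1), in [0, 1]."""
    if levels is None:
        return 0.0 if a == b else 1.0
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)


def record_distance(r: Record, s: Record, schema: Schema) -> float:
    """Unweighted mean of the per-attribute distances."""
    return sum(attr_distance(r[k], s[k], lv) for k, lv in schema.items()) / len(schema)


def link(file_a: Sequence[Record], file_b: Sequence[Record], schema: Schema) -> List[int]:
    """For each record in file_a, return the index of its nearest record in file_b."""
    return [min(range(len(file_b)), key=lambda j: record_distance(r, file_b[j], schema))
            for r in file_a]


if __name__ == "__main__":
    schema: Schema = {"sex": None, "region": None,
                      "education": ["primary", "secondary", "degree", "postgrad"]}
    original = [{"sex": "F", "region": "north", "education": "degree"},
                {"sex": "M", "region": "south", "education": "primary"}]
    protected = [{"sex": "M", "region": "south", "education": "secondary"},
                 {"sex": "F", "region": "north", "education": "postgrad"}]
    print(link(original, protected, schema))  # [1, 0]
```

In the disclosure-risk setting of the last entry, one would count how many protected records are linked back to their true originals in this way. A supervised variant, in the spirit of that entry, would learn per-attribute weights instead of taking the plain mean used here.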
Journal
Journal title: International Journal of Web Information Systems
Year: 2021
ISSN: 1744-0092, 1744-0084
DOI: https://doi.org/10.1108/ijwis-05-2021-0058